Footnotes

1. Stirling's approximation: $\Gamma(x + 1) \approx \sqrt{2\pi x}\, x^{x} e^{-x}$ for large $x$. Higher
order terms may be found in [Abramowitz & Stegun
dominant term of the asymptotic Bayesian test statistic was shown to be a mutual information.
The Bayesian method provided a means for comprehending both the historical test statistic and
test procedure. The historical method is incomplete because it does not rigorously handle small
samples. The Bayesian method explicitly states all assumptions, whereas the historical method
makes assumptions implicitly. The Bayesian method indicates the proper response to take when
using the historical method and symbol counts of zero occur, and it provides a justification for
placing the cutoff in the historical test only on the high end. In retrospect, it appears that the
historical procedure relies heavily upon intuition to provide a useful test statistic and test procedure,
whereas the Bayesian method relies heavily on the subjective quantification of prior knowledge.
When any of these assumptions are violated, we are forced to apply the Bayesian test to gain any
understanding whatever of the relative posterior probabilities of hypotheses. In fact, should we
desire information of the sort that Eqs. (10) and (11) supply, only the Bayesian procedure is
relevant.

The notion of optimality is clear for the Bayesian test. However, for the historical test the
notion of Neyman-Pearson (NP) optimality is meaningless. To see this, recall that for fixed
significance an NP optimal test minimizes the supremum of the sampling probability that the
independent hypothesis is chosen given that the dependent hypothesis is true, where the supremum is
taken as the underlying pf in the dependent hypothesis ranges over the dependent hypothesis
space. Because it is always possible to choose a dependent hypothesis as close to the independent
hypothesis space as may be desired, any dependent hypothesis sampling distribution continuous
in the underlying pf's may be brought as close as desired to any independent hypothesis sampling
distribution. Thus, for all tests, the supremum indicated above is the same, and no NP optimal
test exists for the historical method.

The Bayesian approach clarifies a confusion associated with the historical test. This confusion
concerns whether the cutoff for the historical test should be on high values of the test statistic
alone, or whether both a high cutoff and a low cutoff are reasonable [2]. Recall from the
discussion of Sec. 1 that, in the context of the historical test procedure, the significance level sets a
condition only on the probability of rejection of independence, given independence. The region of the
sampling distribution chosen for acceptance is therefore arbitrary as long as it excludes a rejection
region, in probability, of the given significance. The Bayesian approach strongly argues that, for
the particular hypothesis testing problem considered here, the acceptance cutoff should only be on
the high end.

Finally, note that there is a choice that must be made when the space of symbols being sampled
is continuous, namely how to group the continuous values that may appear into discrete symbols.
Clearly this choice is crucial for making either testing procedure give useful results.
For both procedures there is no definitive method for making this binning choice. However, for
the Bayesian procedure there is at least the certainty that the assumptions involved in the binning
choice can be made explicit. A prior for the binning choices may be assigned, and all calculations
for the joint distribution of one or more of hypothesis, pf, and binning choice may be made
exactly.
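
The binning step described above can be sketched in code. The following is only an illustration under stated assumptions: equal-width bins over a known range are one arbitrary choice among many, and the names `bin_index` and `contingency_table` are hypothetical helpers, not part of the paper.

```python
def bin_index(value, low, high, n_bins):
    """Map a continuous value in [low, high] to a bin index 0..n_bins-1.

    Equal-width bins are an assumption made for this sketch only.
    """
    if not low <= value <= high:
        raise ValueError("value lies outside the assumed range")
    idx = int((value - low) / (high - low) * n_bins)
    return min(idx, n_bins - 1)  # the upper endpoint falls in the last bin


def contingency_table(pairs, x_range, y_range, r, s):
    """Count the observed (x, y) pairs into an r x s table of counts n_ij."""
    counts = [[0] * s for _ in range(r)]
    for x, y in pairs:
        i = bin_index(x, x_range[0], x_range[1], r)
        j = bin_index(y, y_range[0], y_range[1], s)
        counts[i][j] += 1
    return counts


sample = [(0.1, 0.9), (0.6, 0.2), (1.0, 1.0)]
table = contingency_table(sample, (0.0, 1.0), (0.0, 1.0), 2, 2)
```

Either test then operates on `table`. A different choice of `r`, `s`, or bin edges generally changes the counts and therefore the outcome of both tests.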



7.0 Conclusion. 

The Bayesian and historical methods for hypothesis testing have been presented and discussed.
The results for the Bayesian method are exact and were found in a direct manner. The historical
method uses several quantities of tenuous rigor, and these have been understood by relating
them to the exact results of the Bayesian method in the asymptotic limit of large sample size. The
The approximation (Eqs. (14a)-(14d)) holds when all $n_{ij}$, $n_{i\cdot}$, and $n_{\cdot j}$ are nonzero. When any of
these are zero, simply remove them from the appropriate summations wherever they occur,
because $\mathrm{Log}(\Gamma(1)) = 0$ (see Eq. 13). Note that the highest order term is the mutual information
between the estimated pf's $n_{ij}/N$ and $(n_{i\cdot}/N)(n_{\cdot j}/N)$. Letting the true underlying pf be $p^{\circ}_{ij}$,
and assuming it factors so that $p^{\circ}_{ij} = p^a_i p^b_j$, it is easily shown that in the asymptotic regime this
term is closely related to Pearson's chi-squared statistic. Let $\Delta_{ij} = n_{ij}/N - p_{ij}$, with similar
definitions for $\Delta_{i\cdot}$ and $\Delta_{\cdot j}$. In the asymptotic regime and when $\Delta_{ij} \ll p_{ij}$ (as would be typical in large
samples), the highest order term in the approximation (Eq. 14a) is

$$N \sum_{i,j=1}^{r,s} \frac{n_{ij}}{N}\, \mathrm{Log}\!\left(\frac{n_{ij}/N}{(n_{i\cdot}/N)(n_{\cdot j}/N)}\right) \approx$$
We note that $N \sum_{i,j=1}^{r,s} \Delta_{ij}^2 / (p^a_i p^b_j)$ is the Pearson chi-squared statistic, distributed as $\chi^2_{rs-1}$, the
other terms are chi-squared statistics distributed as $\chi^2_{r-1}$ and $\chi^2_{s-1}$ respectively, and a constant
asymptotically proportional to the true mutual information. Because a $\chi^2_n$ distribution has mean $n$
and variance $2n$ [7], then, for sufficiently large values of r and s, it is possible to conclude that
the dominant term in Eq. (15) is the $\chi^2_{rs-1}$ distributed term, $N \sum_{i,j=1}^{r,s} \Delta_{ij}^2 / (p^a_i p^b_j)$.



6.0 Comparisons of the tests. 

The calculations of the last section allow several definitive comparisons of the Bayesian and
historical tests to be made. We begin by interpreting the quantities that appear in the historical test
in terms of Bayesian test quantities.

In the comments at the end of the last section we noted that the dominant term in the asymptotic
form of $\mathrm{Log}(CR(\mathbf{n}))$ arising in the Bayesian hypothesis test is predominantly equal to one
half of the Pearson chi-squared statistic of the historical test, given that the dimensions r and s are
sufficiently large and that the true hypothesis is a member of the independent set. Insofar as the
Pearson chi-squared statistic is asymptotically equal to the historical test statistic, we may
conclude that the historical test is loosely based on twice the logarithm of the minimum risk (error)
Bayesian test. The significance $\alpha$ is therefore loosely the probability that twice the logarithm of
the Bayesian test statistic will exceed the cutoff value c of the historical test. Since the Bayesian
test statistic is the ratio of the posterior probabilities of the dependent and independent
hypotheses, we may conclude that the historical test is based on the same, in the asymptotic regime, and
for sufficiently large dimensions r and s.
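
The relationship between the order-N term and the Pearson statistic can be checked numerically. The sketch below is an illustration only: it uses the plug-in mutual information of Eq. (14a) and the Pearson statistic computed with estimated marginals, and the function names and the example table are assumptions made here.

```python
from math import log


def margins(counts):
    """Return (row totals n_i., column totals n_.j, total N)."""
    rows = [sum(row) for row in counts]
    cols = [sum(col) for col in zip(*counts)]
    return rows, cols, sum(rows)


def estimated_mutual_information(counts):
    """Plug-in mutual information (nats) of n_ij/N against (n_i./N)(n_.j/N)."""
    rows, cols, total = margins(counts)
    mi = 0.0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n > 0:  # zero counts are dropped from the sum
                mi += (n / total) * log(n * total / (rows[i] * cols[j]))
    return mi


def pearson_statistic(counts):
    """Pearson chi-squared statistic with maximum likelihood marginals."""
    rows, cols, total = margins(counts)
    q = 0.0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            expected = rows[i] * cols[j] / total  # assumes nonzero marginals
            q += (n - expected) ** 2 / expected
    return q


table = [[260, 240], [240, 260]]  # nearly independent, N = 1000
order_n_term = 1000 * estimated_mutual_information(table)
half_pearson = 0.5 * pearson_statistic(table)
```

For this nearly independent table the two quantities agree closely, which is the sense in which the order-N term is roughly one half of the Pearson statistic. The agreement degrades for small counts or strongly dependent tables.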

In the small sample regime or when the hypothesis is not necessarily a member of the
independent set (the typical case, otherwise why bother testing at all?), the discussion of the last
paragraph does not hold. Several constants have been neglected (the ratio of the priors, the dimension
dependent term). The previous discussion also breaks down if the dimensions r and s are small.
The n-independent factor, $\Gamma(rs)/(\Gamma(r)\Gamma(s))$, is dependent only upon the dimensions of the
parameter spaces involved. When no observations have been made ($N = 0$) it is cancelled
exactly by the factor $\Gamma(N + r)\Gamma(N + s)/\Gamma(N + rs)$. In the next section we relate Eq. (12) to an
estimated mutual information function and the historical chi-squared test. For now, note that if it
is desired to minimize the risk of error (making an incorrect hypothesis choice) we would choose
$I$ or $\bar{I}$ based on the larger of $P(I \mid \mathbf{n})$ and $P(\bar{I} \mid \mathbf{n})$. This leads to the Bayesian minimal error test
procedure of choosing $I$ if $CR(\mathbf{n}) < 1$, choosing $\bar{I}$ if $CR(\mathbf{n}) > 1$, and otherwise choosing
randomly (with equal probability).
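
As a small illustration of the two statements above, the sketch below checks the cancellation at N = 0 in log space and implements the minimal error rule. The function names are hypothetical, and the hypothesis labels are plain strings chosen for this example.

```python
from math import lgamma
import random


def log_dimension_factor(r, s):
    """Log of Gamma(rs) / (Gamma(r) Gamma(s)); depends only on r and s."""
    return lgamma(r * s) - lgamma(r) - lgamma(s)


def log_sample_size_factor(N, r, s):
    """Log of Gamma(N + r) Gamma(N + s) / Gamma(N + rs)."""
    return lgamma(N + r) + lgamma(N + s) - lgamma(N + r * s)


def choose_hypothesis(cr, rng=random):
    """Minimal error rule: compare CR(n) with 1, breaking ties at random."""
    if cr < 1:
        return "independent"
    if cr > 1:
        return "dependent"
    return rng.choice(("independent", "dependent"))


# With no observations the two log factors sum to zero for any r, s.
residual = log_dimension_factor(3, 4) + log_sample_size_factor(0, 3, 4)
```

Working with logarithms avoids overflow in the Gamma functions when N, r, or s are large.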



5.0 Asymptotics of the Bayesian test and Mutual Information. 

The previous two sections described the historical (Sec. 3) and the Bayesian (Sec. 4) tests for
independence. The easily computed statistic $CR(\mathbf{n})$ was identified as the important object in the
Bayesian test. In this section we develop the asymptotic form of $CR(\mathbf{n})$, show how the dominant
term of this asymptotic form may be viewed as an estimated mutual information, and relate this
asymptotic form to the chi-squared form in the historical test, which is by default an asymptotic
test.

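Taking the logarithm of Eq. (12) gives us the sum of several terms. In finding the asymptotic
form of $\mathrm{Log}(CR(\mathbf{n}))$ there are six basic terms which are easily approximated and summed, along
with the N-independent terms. The six basic terms are the following:

$$\sum_{i,j=1}^{r,s} \mathrm{Log}(\Gamma(n_{ij} + 1)), \quad -\sum_{i=1}^{r} \mathrm{Log}(\Gamma(n_{i\cdot} + 1)), \quad -\sum_{j=1}^{s} \mathrm{Log}(\Gamma(n_{\cdot j} + 1)),$$

$$\mathrm{Log}(\Gamma(N + r)), \quad \mathrm{Log}(\Gamma(N + s)), \quad \text{and} \quad -\mathrm{Log}(\Gamma(N + rs)). \tag{13}$$

Using Stirling's formula (see footnote 1) and carrying out the approximations to $O(1/N)$ shows that
$\mathrm{Log}(CR(\mathbf{n}))$ is the sum of terms of various orders in N. We write these terms in decreasing order:

Order N:

$$N \sum_{i,j=1}^{r,s} \frac{n_{ij}}{N}\, \mathrm{Log}\!\left(\frac{n_{ij}/N}{(n_{i\cdot}/N)(n_{\cdot j}/N)}\right) \tag{14a}$$

Order Log(N):

$$\frac{1}{2}\sum_{i,j=1}^{r,s} \mathrm{Log}(n_{ij}) - \frac{1}{2}\sum_{i=1}^{r} \mathrm{Log}(n_{i\cdot}) - \frac{1}{2}\sum_{j=1}^{s} \mathrm{Log}(n_{\cdot j}) - \left(rs - r - s + \frac{1}{2}\right) \mathrm{Log}(N) \tag{14b}$$

Order 1:

$$\frac{rs - r - s + 1}{2}\, \mathrm{Log}(2\pi) + \mathrm{Log}\!\left(\frac{\Gamma(rs)}{\Gamma(r)\Gamma(s)}\right) + \mathrm{Log}(C) \tag{14c}$$

Order 1/N: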
For the independent case in Eq. (3) we may rewrite the integral for $P(\mathbf{n} \mid I)$ as

$$P(\mathbf{n} \mid I) \propto \binom{N}{\mathbf{n}} \left(\int \prod_{i=1}^{r} (p^a_i)^{n_{i\cdot}}\, \delta\Big(\sum_{i=1}^{r} p^a_i - 1\Big)\, d\mathbf{p}^a\right) \times \left(\int \prod_{j=1}^{s} (p^b_j)^{n_{\cdot j}}\, \delta\Big(\sum_{j=1}^{s} p^b_j - 1\Big)\, d\mathbf{p}^b\right). \tag{5}$$

Similarly, for the dependent case in Eq. (3) we may write the integral for $P(\mathbf{n} \mid \bar{I})$ as

$$P(\mathbf{n} \mid \bar{I}) \propto \binom{N}{\mathbf{n}} \int \prod_{i,j=1}^{r,s} p_{ij}^{n_{ij}}\, \delta\Big(\sum_{i,j=1}^{r,s} p_{ij} - 1\Big)\, d\mathbf{p}. \tag{6}$$

In Eqs. (5) and (6) note that, because of normalization of the priors, the $N = 0$ values of the
right hand sides of the proportionalities Eqs. (5) and (6) should be 1. The integrals in Eqs. (5) and
(6) are found using convolution and Laplace transform techniques described in [5]. The
integration shows that the normalization constant for the independent prior is $\Gamma(r)\Gamma(s)$, while that of the
dependent prior is $\Gamma(rs)$. Taking into account the normalization constants for the priors, the
results for the hypothesis-conditioned sampling distributions are

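$$P(\mathbf{n} \mid I) = \Gamma(r)\Gamma(s) \binom{N}{\mathbf{n}} \frac{\prod_{i=1}^{r} \Gamma(n_{i\cdot} + 1) \prod_{j=1}^{s} \Gamma(n_{\cdot j} + 1)}{\Gamma(N + r)\Gamma(N + s)}, \tag{7a}$$

and

$$P(\mathbf{n} \mid \bar{I}) = \Gamma(rs) \binom{N}{\mathbf{n}} \frac{\prod_{i,j=1}^{r,s} \Gamma(n_{ij} + 1)}{\Gamma(N + rs)}. \tag{7b}$$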

Let $C = P(\bar{I})/P(I)$ and $R(\mathbf{n}) = P(\mathbf{n} \mid \bar{I})/P(\mathbf{n} \mid I)$. Using Eq. (2), the posterior probabilities of the
hypotheses may be rewritten in terms of known quantities as

$$P(I \mid \mathbf{n}) = 1/(1 + CR(\mathbf{n})). \tag{8}$$

Similarly,

$$P(\bar{I} \mid \mathbf{n}) = CR(\mathbf{n})/(1 + CR(\mathbf{n})). \tag{9}$$

It should be pointed out that the probabilities for the hypotheses given the data in Eqs. (8) and (9)
are exact and hold for all N.

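Other desired quantities are the posterior probabilities that a pf occurs given the observed data
and hypothesis, $P(\mathbf{p} \mid \mathbf{n}, I)$ and $P(\mathbf{p} \mid \mathbf{n}, \bar{I})$. Using Bayes Theorem, write these in terms of known
quantities (prior equation and Eqs. (4), (5)) as

$$P(\mathbf{p} \mid \mathbf{n}, I) = P(\mathbf{n} \mid \mathbf{p}, I) P(\mathbf{p} \mid I) / P(\mathbf{n} \mid I), \tag{10a}$$

and similarly (prior equation and Eqs. (4), (6))

$$P(\mathbf{p} \mid \mathbf{n}, \bar{I}) = P(\mathbf{n} \mid \mathbf{p}, \bar{I}) P(\mathbf{p} \mid \bar{I}) / P(\mathbf{n} \mid \bar{I}). \tag{10b}$$

Another quantity of interest is the probability of any pf given data only, which is written in terms
of known quantities (Eqs. (8), (9), and (10)) as

$$P(\mathbf{p} \mid \mathbf{n}) = P(\mathbf{p} \mid \mathbf{n}, I) P(I \mid \mathbf{n}) + P(\mathbf{p} \mid \mathbf{n}, \bar{I}) P(\bar{I} \mid \mathbf{n}). \tag{11}$$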
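At this point it is of interest to examine the function $R(\mathbf{n})$. Explicitly,

$$R(\mathbf{n}) = \frac{\prod_{i,j=1}^{r,s} \Gamma(n_{ij} + 1)}{\prod_{i=1}^{r} \Gamma(n_{i\cdot} + 1) \prod_{j=1}^{s} \Gamma(n_{\cdot j} + 1)} \cdot \frac{\Gamma(N + r)\Gamma(N + s)}{\Gamma(N + rs)} \cdot \frac{\Gamma(rs)}{\Gamma(r)\Gamma(s)}. \tag{12}$$

Note that $R(\mathbf{n})$ consists of a factor that is dependent upon $\mathbf{n}$ and a factor that is independent of $\mathbf{n}$.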
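
The ratio R(n) is straightforward to evaluate with log-Gamma functions. The following is a minimal sketch, assuming the counts are given as a list of rows. The names `log_R` and `posterior_independence` are hypothetical, and `prior_ratio` stands for the constant C = P(dependent)/P(independent).

```python
from math import exp, lgamma, log


def log_R(counts):
    """Log of the likelihood ratio R(n) = P(n | dependent) / P(n | independent)."""
    r, s = len(counts), len(counts[0])
    row_totals = [sum(row) for row in counts]        # n_i.
    col_totals = [sum(col) for col in zip(*counts)]  # n_.j
    N = sum(row_totals)
    # Factor that depends on the individual counts.
    count_part = (sum(lgamma(n + 1) for row in counts for n in row)
                  - sum(lgamma(n + 1) for n in row_totals)
                  - sum(lgamma(n + 1) for n in col_totals))
    # Factor that depends only on N and the dimensions r, s.
    size_part = (lgamma(N + r) + lgamma(N + s) - lgamma(N + r * s)
                 + lgamma(r * s) - lgamma(r) - lgamma(s))
    return count_part + size_part


def posterior_independence(counts, prior_ratio=1.0):
    """P(independent | n) = 1 / (1 + C R(n)), evaluated stably in log space."""
    log_cr = log(prior_ratio) + log_R(counts)
    if log_cr > 0:
        e = exp(-log_cr)
        return e / (1.0 + e)
    return 1.0 / (1.0 + exp(log_cr))


p_diag = posterior_independence([[10, 0], [0, 10]])     # strongly dependent table
p_flat = posterior_independence([[10, 10], [10, 10]])   # evenly spread table
```

With equal prior probabilities (`prior_ratio=1.0`), the diagonal table gives a posterior for independence well below one half, while the evenly spread table gives a posterior above one half. Zero counts need no special handling here because log Gamma(1) = 0.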
that the pf factors as in Sec. 2 (independence), and the second that it does not (dependence).
Clearly, the independent hypothesis has lower dimensionality than the dependent hypothesis.
Indeed, much as we would separate a line from a plane, we may look at the choice of the two
hypotheses as having divided a higher dimensional space, the space of all pf's, into a lower
dimensional subspace and a space with the same dimensionality as the original. Let the prior
probability of the independent hypothesis, $I$, be $P(I)$, and similarly, let the prior probability of the
dependent hypothesis, $\bar{I}$, be $P(\bar{I})$. Clearly $P(I) + P(\bar{I}) = 1$. Further, let the prior on each
hypothesis space of pf's be uniform, or constant, so that $P(\mathbf{p} \mid I) = c_I$ and $P(\mathbf{p} \mid \bar{I}) = c_{\bar{I}}$. The constants
are such that each prior is normalized. Note that each prior is zero for pf's outside the respective
hypothesis spaces. Because both of the hypothesis spaces contain pf's that are constrained by
having components summing to one, it is important to explicitly state the form of the priors
$P(\mathbf{p} \mid I)$ and $P(\mathbf{p} \mid \bar{I})$. When the pf factors, the rs probabilities in each pf are specified by $r + s$
underlying parameters, which in this problem represent marginal probabilities. The constraint
$\sum_{i,j=1}^{r,s} p_{ij} = 1$ reduces the number of independent parameters to $r + s - 2$. When the pf does not
factor, there are rs parameters, with the constraint reducing the number of independent
parameters to $rs - 1$. For the independent case, we choose as the underlying parameters two pf's, $\mathbf{p}^a$ and
$\mathbf{p}^b$, of dimension r and s respectively, and incorporate the constraints that the components
of the respective pf's sum to one using delta functions, so that
$P(\mathbf{p} \mid I) = P(\mathbf{p}^a, \mathbf{p}^b \mid I) \propto \delta(\sum_{i=1}^{r} p^a_i - 1)\, \delta(\sum_{j=1}^{s} p^b_j - 1)$. Similarly, for the dependent case we
parameterize the space using $\mathbf{p}$ itself, so that $P(\mathbf{p} \mid \bar{I}) \propto \delta(\sum_{i,j=1}^{r,s} p_{ij} - 1)$. It is not difficult to show
that, with the priors chosen in this manner, the densities on the corresponding surfaces of
constraint are indeed constant [4]. The assumption of uniformity is driven by the subjective desire to
put as little knowledge as possible into the prior over the pf's, and to provide for simplicity of
calculation. Uniformity is one requirement that we may drop: when the prior is not uniform, a
method for computing results similar to those presented in this paper is given in
[6]. Related hypothesis testing work appears in [8].

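A desired quantity is the posterior probability of independence given data, $P(I \mid \mathbf{n})$. From it,
because there are only two mutually exclusive hypotheses, we find that the probability of
dependence given data is $P(\bar{I} \mid \mathbf{n}) = 1 - P(I \mid \mathbf{n})$. Using Bayes Theorem the posterior hypothesis
probabilities are given by

$$P(I \mid \mathbf{n}) = P(\mathbf{n} \mid I) P(I) / P(\mathbf{n}), \quad \text{and} \quad P(\bar{I} \mid \mathbf{n}) = P(\mathbf{n} \mid \bar{I}) P(\bar{I}) / P(\mathbf{n}), \tag{1}$$

where $P(\mathbf{n} \mid I)$ is the likelihood, or sampling distribution, given that hypothesis $I$ is true, and
similarly $P(\mathbf{n} \mid \bar{I})$ is the likelihood given that hypothesis $\bar{I}$ is true. Again, because there are only two
mutually exclusive hypotheses,

$$P(\mathbf{n}) = P(\mathbf{n} \mid I) P(I) + P(\mathbf{n} \mid \bar{I}) P(\bar{I}). \tag{2}$$

Furthermore,

$$P(\mathbf{n} \mid I) = \int P(\mathbf{n} \mid \mathbf{p}, I) P(\mathbf{p} \mid I)\, d\mathbf{p}, \quad \text{and similarly} \quad P(\mathbf{n} \mid \bar{I}) = \int P(\mathbf{n} \mid \mathbf{p}, \bar{I}) P(\mathbf{p} \mid \bar{I})\, d\mathbf{p}, \tag{3}$$

where $P(\mathbf{n} \mid \mathbf{p}, I)$ and $P(\mathbf{n} \mid \mathbf{p}, \bar{I})$ are the likelihoods given fixed pf's in the respective hypotheses.
The fixed pf likelihoods in Eq. (3) are multinomial distributions, with

$$P(\mathbf{n} \mid \mathbf{p}, I) = \binom{N}{\mathbf{n}} \prod_{i,j=1}^{r,s} (p^a_i p^b_j)^{n_{ij}} \quad \text{and} \quad P(\mathbf{n} \mid \mathbf{p}, \bar{I}) = \binom{N}{\mathbf{n}} \prod_{i,j=1}^{r,s} p_{ij}^{n_{ij}}, \tag{4}$$

where $\binom{N}{\mathbf{n}} = N! / \prod_{i,j} n_{ij}!$ is the multinomial coefficient.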
The historical testing approach considered in this section is the chi-squared test for
independence. When the underlying pf is represented by a vector $\mathbf{p}$, and observed counts are represented
by a vector $\mathbf{n}$ (both of dimension r), the chi-squared test is based upon the observation that for
$N \to \infty$ the distribution of the statistic $Q(\mathbf{n}, \mathbf{p}) = \sum_{i=1}^{r} (n_i - N p_i)^2 / (N p_i)$ converges to a $\chi^2$
distribution with $r - 1$ degrees of freedom (dof) [1] [2] [3]. (Explicitly, the $\chi^2$ probability density
function with n dof is given by $\chi^2_n(x) = x^{n/2 - 1} e^{-x/2} / (2^{n/2}\, \Gamma(n/2))$, where $x > 0$.) (For small N
the distribution of Q is somewhat different: we must examine the exact sampling distribution of $\mathbf{n}$
in order to find the sampling distribution of Q.) In the historical framework the hypotheses being
tested are $H_0$: $\mathbf{p}$ and $H_1$: not $\mathbf{p}$. The decision to reject $H_0$ with significance $\alpha \in (0, 1)$ is based
on the cumulative distribution function of Q: with c chosen so that $P(Q > c) = \alpha$, $H_0$ is rejected
if $Q > c$, otherwise it is accepted. Qualitatively, all is well. The Q functional is proportional to a
measure of the 'distance' of $\mathbf{n}/N$ from $\mathbf{p}$. This is easily seen by rearranging Q as
$Q(\mathbf{n}, \mathbf{p}) \propto \sum_{i=1}^{r} (n_i/N - p_i)^2 / p_i$, where the proportionality constant is N. For $N \to \infty$, the
right-hand side of the proportionality converges to zero when the hypothesis is true, otherwise it
converges to a nonzero value. With the factor of N present, it either converges to a $\chi^2_{r-1}$ distributed
random variable (rv), or it diverges to infinity. The significance is the probability of incorrectly
choosing the hypothesis $H_1$ (given $H_0$ is true) for this test. Further, and perhaps most importantly,
when $\mathbf{p}$ is not the true pf, the values of Q are increasing for $N \to \infty$, and the test is based upon
rejecting $H_0$ for large Q.

To apply the chi-squared test just outlined to the problem of determination of independence,
let the assumed hypotheses be $H_0$: $p_{ij} = p_{i\cdot}\, p_{\cdot j}$ and $H_1$: $p_{ij} \neq p_{i\cdot}\, p_{\cdot j}$. The quantities $p_{i\cdot}$ and $p_{\cdot j}$ are
estimated from the data by their maximum likelihood values $\hat{p}_{i\cdot} = n_{i\cdot}/N$ and $\hat{p}_{\cdot j} = n_{\cdot j}/N$. With
the estimated marginal distributions found in this manner, the statistic
$Q(\mathbf{n}, \hat{\mathbf{p}}) = \sum_{i=1}^{r} \sum_{j=1}^{s} (n_{ij} - N \hat{p}_{i\cdot} \hat{p}_{\cdot j})^2 / (N \hat{p}_{i\cdot} \hat{p}_{\cdot j})$ is asymptotically ($N \to \infty$) distributed as a $\chi^2$ rv
with $(r - 1)(s - 1)$ dof [2]. Using this $\chi^2$ distribution, the test for a given significance goes
through as before. There are several criticisms that must be made, but we leave these until Sec. 6
where this historical testing method is compared to the Bayesian method.
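
The historical procedure just described can be sketched as follows. This is a minimal illustration, assuming all marginal counts are nonzero and that the cutoff c for the chosen significance is supplied by the caller (for example from a chi-squared table). The function names are hypothetical.

```python
def independence_statistic(counts):
    """Return (Q, dof) for an r x s table of observed counts n_ij."""
    r, s = len(counts), len(counts[0])
    row_totals = [sum(row) for row in counts]        # n_i.
    col_totals = [sum(col) for col in zip(*counts)]  # n_.j
    total = sum(row_totals)                          # N
    q = 0.0
    for i in range(r):
        for j in range(s):
            # N * p_i. * p_.j using the maximum likelihood marginals;
            # this assumes no row or column total is zero.
            expected = row_totals[i] * col_totals[j] / total
            q += (counts[i][j] - expected) ** 2 / expected
    return q, (r - 1) * (s - 1)


def reject_independence(counts, cutoff):
    """Reject H0 (independence) when Q exceeds the supplied cutoff c."""
    q, _ = independence_statistic(counts)
    return q > cutoff


q, dof = independence_statistic([[10, 0], [0, 10]])
```

For a 2 x 2 table there is one degree of freedom, for which the standard 5% cutoff is approximately 3.841. The diagonal table above gives Q = 20 and is rejected at that level.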

4.0 Bayesian test for independence. 

In the Bayesian procedure we are directly interested in, given the observed data, the
probabilities of independence and dependence of the underlying pf, as well as any other quantities
(averages, uncertainties, etc.) that may be of interest. Later we interpret the procedure in terms of the
minimization of a risk. In the process of developing the Bayesian approach to this problem we
show that the important underlying quantity of interest is the mutual information function.

The crux of the Bayesian approach is the choice of prior for the two hypotheses, a choice we
necessarily make based on subjective measures of our knowledge of the underlying pf, and with
considerations made for assuring the calculability of results under our assumptions. The prior
makes explicit all assumptions involved. With this in mind, consider the two hypotheses, the first
testing framework and from the Bayesian hypothesis testing framework, assuming an optimality
criterion and test, or prior and risk, as appropriate to the method. By comparing the Bayesian
hypothesis testing procedure to the historical hypothesis testing procedure, we make clear some of
the assumptions that the historical hypothesis testing procedure implicitly makes. For instance,
we demonstrate that the notions of optimality in the two frameworks are quite different, with the
notion of Neyman-Pearson optimality being vacuous for the problem under consideration. Most
importantly, we show that the problem of testing for independence is easily formulated within the
Bayesian framework. In the Bayesian framework all assumptions are explicit and there is no need
to improvise a testing procedure. Indeed, all quantities of interest are calculable within the
Bayesian framework, while it is not clear what quantities are calculable in the historical framework.
Also, the Bayesian approach immediately gives a result useful for all sample sizes, whereas the
historical procedure does not apply when the sample size is small. A clear benefit of the historical
procedure is 1) no need to consider hypothesis probabilities. (In many cases this is not just of
theoretical importance: the dimensionality of the spaces involved makes it a necessity, but this is
not the case here.) Clear benefits of the Bayesian method are the ability to 1) quantify all
assumptions, 2) understand sensitivity to the assumptions, 3) rigorously determine probabilities of
hypotheses given data and assumptions, and 4) rigorously understand the uncertainties involved.
We also note that with the Bayesian method there is 5) no need for the discovery and analysis of an
appropriate testing procedure.

2.0 Definition of the problem. 

The problem we are considering is that of determining whether the joint probability function
(pf) $P(X, Y)$ is independent given a sample of N observations (ordered pairs of values $(X, Y)$)
drawn from this pf. A joint pf is independent when the probability of seeing a certain value of X
is independent of the value of Y (or similarly with X and Y interchanged). This definition is easily shown to be
equivalent to the pf being factorable as $P(X, Y) = P(X)P(Y)$, where $P(X = x_i) = \sum_{j=1}^{s} P(x_i, y_j)$
and similarly $P(Y = y_j) = \sum_{i=1}^{r} P(x_i, y_j)$. (Unless otherwise indicated, the indices i, j are assumed to run from 1
to integers r and s respectively. That is, there are r distinct possible values (symbols) of X and
similarly s distinct values of Y.) If a joint pf is not independent then it is dependent. For brevity
we use the notation $p_{ij} = P(X = x_i, Y = y_j)$, $p_{i\cdot} = P(X = x_i)$ and $p_{\cdot j} = P(Y = y_j)$. Since we will be
considering observed data, the notation $n_{ij}$ indicates the number of observations with
$X = x_i$, $Y = y_j$, and similarly $n_{i\cdot} = \sum_{j=1}^{s} n_{ij}$ and $n_{\cdot j} = \sum_{i=1}^{r} n_{ij}$. The constraints $\sum_{i,j=1}^{r,s} n_{ij} = N$
and $\sum_{i,j=1}^{r,s} p_{ij} = 1$ are immediate and are assumed to hold through all that follows. At times we
consider vectors representing pf's and the corresponding observed data and denote their
components $p_i$ and $n_i$ respectively. Whenever possible, and depending upon the context, we represent
vectors or matrices in bold type, e.g. $\mathbf{p} = (p_i)$ or $\mathbf{p} = (p_{ij})$.
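
The definitions above translate directly into code. The sketch below is an illustration only, with hypothetical function names: it computes the marginals of a joint pf stored as a list of rows and checks whether the pf factors, up to a small numerical tolerance.

```python
def marginals(p):
    """Return (p_i., p_.j) for a joint pf given as an r x s list of rows."""
    row_marginal = [sum(row) for row in p]        # p_i. = sum over j of p_ij
    col_marginal = [sum(col) for col in zip(*p)]  # p_.j = sum over i of p_ij
    return row_marginal, col_marginal


def is_independent(p, tol=1e-12):
    """True when p_ij equals p_i. * p_.j for every cell, within tol."""
    row_marginal, col_marginal = marginals(p)
    return all(
        abs(p[i][j] - row_marginal[i] * col_marginal[j]) <= tol
        for i in range(len(p))
        for j in range(len(p[0]))
    )


factorable = [[0.08, 0.12], [0.32, 0.48]]  # marginals (0.2, 0.8) and (0.4, 0.6)
diagonal = [[0.5, 0.0], [0.0, 0.5]]        # X determines Y, so dependent
```

The same `marginals` helper applied to a table of counts gives the row and column totals used throughout the paper.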

3.0 Historical test for independence using chi-squared. 
procedure, given data and several hypotheses to be tested for, tests each hypothesis, chooses the
hypothesis by examining the test values, and provides a measure of the quality of the test by
providing the level of significance of each hypothesis. Further development of the hypothesis testing
framework considers the notion of optimality of a test. For the two hypothesis case, the
Neyman-Pearson optimal test with specified significance level is any test that minimizes the probability
that the hypothesis is accepted given that it is not true, and has a significance level not surpassing
the specified significance level. It is not difficult to show that all Neyman-Pearson optimal tests
for simple hypotheses are of the form of a direct comparison of the ratio of the likelihoods for
each of the hypotheses to a fixed value.

The historical hypothesis testing procedure ignores the probabilities of occurrence for each of
the hypotheses and of the various parameter vectors of the hypotheses. The procedure implicitly
assumes that each hypothesis occurs with some fixed probability and that the parameter vectors
for each hypothesis occur with some fixed probability, but does not make it clear what these
assumptions are. The important indicator that the probabilities of occurrence for each of the
hypotheses are ignored is that everything is based on the sampling distribution, or likelihood.

Another framework exists for hypothesis testing that does not make implicit assumptions
about the probabilities of occurrence of hypotheses and parameter vectors. In the Bayesian
framework [3], a prior distribution is chosen that quantifies how the various hypotheses occur. Instead
of grouping all parameter vectors in a given hypothesis into a group with unclear probabilities of
occurrence, the prior quantifies the probability of occurrence of the parameter vectors. Further, in
the Bayesian framework all hypothesis choices are based upon a risk function: choose the
hypothesis that minimizes the risk. To see why this is important, consider the case where there are two
simple hypotheses, but it is known that one is far more probable than the other. Suppose that both
hypotheses explain the data equally well (where 'explaining equally well' means that both
likelihood functions are equal). The historical testing procedure with a strict likelihood-based test will
choose one of the hypotheses at random (because they have equal likelihoods), while the Bayes
procedure, which considers the probabilities of the hypotheses and minimizes a risk that is the
probability of error of choice, will choose the hypothesis that is most probable. In many cases this
is the desirable choice. In other cases there may be such a high risk associated with the incorrect
choice that the Bayesian method would choose the low-probability hypothesis, even under the
conditions outlined above. Another reason it is important to understand the assumptions
completely is that often a procedure is desired that can choose between hypotheses parameterized, say,
by regions of $\mathbb{R}^n$, with a different n for each hypothesis. (A case of this form is being considered
in the rest of this paper.) Without clearly specifying the prior, it is difficult to understand how any
hypothesis involving a parameter space of lower dimension could be favored over a parameter
space of higher dimension, especially if the union of the disjoint spaces of lower dimension and
higher dimension covers the space of probability functions.
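
The two-hypothesis example in the preceding paragraph can be made concrete. The sketch below is only an illustration: the prior values, likelihood values, and loss matrices are invented for the example, and `bayes_choice` is a hypothetical helper that picks the hypothesis with the smallest posterior expected loss.

```python
def bayes_choice(priors, likelihoods, loss):
    """Return (index of chosen hypothesis, posteriors, risks).

    loss[k][m] is the assumed cost of choosing hypothesis k when m is true.
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)
    posteriors = [j / evidence for j in joint]
    risks = [
        sum(loss[k][m] * posteriors[m] for m in range(len(posteriors)))
        for k in range(len(loss))
    ]
    best = min(range(len(risks)), key=risks.__getitem__)
    return best, posteriors, risks


priors = [0.9, 0.1]        # hypothesis 0 is far more probable a priori
likelihoods = [0.5, 0.5]   # both explain the data equally well

# Risk = probability of error (0-1 loss): the more probable hypothesis wins.
error_loss = [[0, 1], [1, 0]]
choice_error, _, _ = bayes_choice(priors, likelihoods, error_loss)

# Wrongly choosing hypothesis 0 is assumed to be 100 times as costly:
# the low-probability hypothesis is now chosen.
costly_loss = [[0, 100], [1, 0]]
choice_costly, _, _ = bayes_choice(priors, likelihoods, costly_loss)
```

A strict likelihood comparison cannot distinguish the two hypotheses here, whereas the choice under a risk function depends on both the prior and the loss.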

In what follows we examine a specific hypothesis testing problem: that of testing for
independence of an underlying joint probability distribution based on a finite sample of observed data
drawn from that distribution. We will consider the problem from both the historical hypothesis
0.0 Abstract.

The problem of hypothesis testing is examined from both the historical and the Bayesian
points of view in the case that sampling is from an underlying joint probability distribution and
the hypotheses tested for are those of independence and dependence of the underlying
distribution. Exact results for the Bayesian method are provided. Asymptotic Bayesian results and
historical method quantities are compared, and historical method quantities are interpreted in terms of
clearly defined Bayesian quantities. The asymptotic Bayesian test relies upon a statistic that is
predominantly mutual information.

1.0 Introduction. 

Problems of hypothesis testing arise ubiquitously in situations where observed data is
produced by an unknown process and the question is asked "From what process did this observed
data arise?" Historically, the hypothesis testing problem is approached from the point of view of
sampling, whereby several fixed hypotheses to be tested for are given, and all measures of the test
and its quality are found directly from the likelihood, i.e. by what amounts to sampling the
likelihood [2] [3]. (To be specific, a hypothesis is a set of possible parameter vectors, each parameter
vector completely specifying a sampling distribution. A simple hypothesis is a hypothesis set that
contains one parameter vector. A composite hypothesis occurs when the (nonempty) hypothesis
set is not a single parameter vector.) Generally, the test procedure chooses as true the hypothesis
that gives the largest test value, although the notion of procedure is not specific and may refer to
any method for choosing the hypothesis given the test values. Since it is of interest to quantify the
quality of the test, a level of significance is generated, this level being the probability that, under
the chosen hypothesis and test procedure, an incorrect hypothesis choice is made. The
significance is generated using the sampling distribution, or likelihood. For simple hypotheses the level
of significance is found using the single parameter value of the hypothesis. When a test is applied
in the case of a composite hypothesis, a size for the test is found that is given by the supremum
probability (ranging over the parameter vectors in the hypothesis set) that under the chosen
hypothesis an incorrect hypothesis choice is made. To summarize, the historical hypothesis testing